Abstract. Terms are a concise representation of tree structures. Since they can be
naturally defined by an inductive type, they offer data structures in functional programming
and mechanised reasoning with useful principles such as structural induction and
structural recursion. However, for graphs or "tree-like" structures - trees involving cycles and
sharing - it remains unclear what kind of inductive structures exist and how we can
faithfully assign a term representation to them. In this paper we propose a simple term syntax
for cyclic sharing structures that admits structural induction and recursion principles. We
show that the obtained syntax is directly usable in the functional language Haskell and
the proof assistant Agda, in the same way as ordinary data structures such as lists and trees. To
achieve this goal, we use a categorical approach to initial algebra semantics in a presheaf
category. That approach follows the line of Fiore, Plotkin and Turi's models of abstract
syntax with variable binding.



1. Introduction 

Terms are a convenient, concise and mathematically clean representation of tree struc- 
tures used in logic and theoretical computer science. In the fields of traditional algorithms 
and graph theory, one usually uses unstructured representations for trees, such as pairs 
$(V, E)$ of vertex and edge sets, adjacency lists, adjacency matrices, pointer structures,
etc. Such representations are more complex and less readable than terms. We know that
term representation provides a well-structured, compact and more readable notation. 

However, consider the case of a "tree-like" structure such as that depicted below. 

This kind of structure - graphs, but almost trees involving (a few) exceptional edges -
quite often appears in logic and computer science. Examples include internal representations
of expressions in implementations of functional languages that share common sub-expressions
for efficiency, data models of XML such as trees with pointers [CGZ05], proof trees
admitting cycles for cyclic proofs [Bro05], term graphs in graph rewriting [BvEG+87, AK96],
and control flow graphs of imperative programs used in static analysis and compiler
optimizations [CFR+91]. Suppose that we want to treat such structures in a pure functional
programming language such as Haskell or Clean, or in a proof assistant such as Coq or
Agda [Nor07]. In such a case, we would have to abandon the use of a naive term
representation, and would instead be compelled to use an unstructured representation
such as $(V, E)$, adjacency lists, etc. Furthermore, a serious problem is that we would have
to abandon structural recursion and induction to decompose them, because they look
"tree-like" but are in fact graphs, so there is no obvious inductive structure in them. This means
that in functional programming, we cannot use pattern matching to treat tree-like
structures, which greatly decreases their convenience. The lack of structural induction means
that such structures fail to form an inductive type. But are there really no inductive
structures in tree-like structures? As might be readily apparent, tree-like structures are almost
trees and merely contain finite pieces of information. The only differences are the presence
of "cycles" and "sharing".

In this paper, we give an initial algebra characterisation of cyclic sharing tree structures 
in the framework of categorical universal algebra. The aim of this paper is to derive the 
following practical goals from the initial algebra characterisation. 

[I] To develop a simple term syntax for cyclic sharing structures that admits structural
    induction and structural recursion principles.
[II] To make the obtained syntax directly usable in current functional languages and
    proof assistants, in the same way as ordinary data structures such as lists and trees.
The goal [I] requires that the term syntax exactly represents cyclic sharing structures (i.e.
no junk terms exist) to make structural induction possible. The goal [II] requires that the
obtained syntax should be as lightweight as possible, which means that e.g. well-formedness
and equality tests on terms for cyclic sharing structures should be fast and easy, as they are
for ordinary data structures such as lists and trees. We do not want many axioms to characterise
the intended structures because, in a programming situation, checking the validity of axioms
every time is expensive and makes everything complicated. Ideally, formulating structures
without axioms is best. Therefore, the goal [II] is rephrased more specifically as:
[II'] To give an inductive type that represents cyclic sharing structures uniquely. We
    therefore rely on the type checker to ensure the well-formedness of cyclic sharing
    structures automatically.
To show this, we give concrete definitions of types for cyclic sharing structures in two
systems: the functional programming language Haskell and the proof assistant Agda.

1.1. Variations on initial algebra semantics. Initial algebra semantics models
syntax/datatypes as the initial algebra, semantics as another algebra, and the compositional
interpretation as the unique homomorphism. The classical formulation of initial algebra
semantics for syntax/datatypes taken by ADJ [GTW76] is categorically reformulated as an
initial algebra of a functor in the category $\mathbf{Set}$ of sets and functions [Rob02], which means
that carriers are merely sets and operations are functions.

Recently, by varying the base category away from $\mathbf{Set}$, initial algebra semantics for
algebras of functors has proved to be a useful framework for characterising various
mathematical and computational structures in a uniform setting. We list several: $S$-sorted
abstract syntax is characterised as an initial algebra in $\mathbf{Set}^S$ [Rob02], second-order abstract
syntax as an initial algebra in $\mathbf{Set}^{\mathbb{F}}$ [FPT99, Ham04, Fio08] (where $\mathbb{F}$ is the category of
finite sets), explicit substitutions as initial algebras in the category $[\mathbf{Set}, \mathbf{Set}]_f$ of finitary
functors [GUH06], recursive path ordering for term rewriting systems as algebras in the
category $\mathbf{LO}$ of linear orders [Has02], second-order rewriting systems as initial algebras in
the preorder-valued functor category $\mathbf{Pre}^{\mathbb{F}}$ [Ham05], and nested datatypes [GJ07] and
generalised algebraic datatypes (GADTs) [JG08] in functional programming as initial algebras
in $[\mathcal{C},\mathcal{C}]$ and $[|\mathcal{C}|,\mathcal{C}]$, respectively, where $\mathcal{C}$ is an $\omega$-cocomplete category.

This paper adds a further example to the list given above. We characterise cyclic sharing
structures as an initial algebra in the category $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$, where $\mathbb{T}$ is the set of all "shapes"
of trees and $\mathbb{T}^*$ is the set of all tree shape contexts. We derive structural induction and
recursion principles from it. An important point is that we merely use algebras of a functor to
formulate cyclic sharing structures, i.e. not (models of) equational specifications or
$(\Sigma, E)$-algebras. This characterisation achieves the requirement of "without axioms". Moreover,
it is the key to formulating them by an inductive type.

1.2. Basic idea. It is known in the field of graph algorithms [Tar72] that, by traversing a
rooted directed graph in a depth-first search manner, we obtain a depth-first search tree, 
which consists of a spanning tree (whose edges are called tree edges) of the graph and forward 
edges (which connect ancestors in the tree to descendants), back edges (the reverse), and 
cross edges (which connect nodes across the tree from right to left). 
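
The edge classification described above can be sketched in Haskell. This is only an illustrative sketch under our own assumptions: the names `Graph`, `EdgeKind`, `classify` and `demo` are hypothetical, graphs are given as association lists of successors in edge order, and every node is assumed reachable from the root.

```haskell
type Node  = Int
type Graph = [(Node, [Node])]   -- successors are listed in edge order

data EdgeKind = TreeEdge | ForwardEdge | BackEdge | CrossEdge
  deriving (Eq, Show)

-- DFS state: discovery times, finished nodes, a clock,
-- and the edges classified so far (most recent first).
data St = St
  { disc  :: [(Node, Int)]
  , done  :: [Node]
  , clock :: Int
  , out   :: [((Node, Node), EdgeKind)]
  }

classify :: Graph -> Node -> [((Node, Node), EdgeKind)]
classify g root = reverse (out (visit root (St [] [] 0 [])))
  where
    succs v = maybe [] id (lookup v g)

    -- Discover v, process its outgoing edges in order, then mark v finished.
    visit v st0 =
      let st1 = st0 { disc = (v, clock st0) : disc st0, clock = clock st0 + 1 }
          st2 = foldl (edge v) st1 (succs v)
      in st2 { done = v : done st2 }

    edge v st w =
      let emit k = st { out = ((v, w), k) : out st }
          dv     = maybe 0 id (lookup v (disc st))
      in case lookup w (disc st) of
           Nothing -> visit w (emit TreeEdge)          -- w is new: tree edge
           Just dw
             | w `notElem` done st -> emit BackEdge    -- w is still on the DFS stack
             | dv < dw             -> emit ForwardEdge -- w is a finished descendant
             | otherwise           -> emit CrossEdge   -- w was finished before v started

-- A small hypothetical graph with one edge of each kind.
demo :: Graph
demo = [(0, [1, 2, 3]), (1, [3]), (2, [3, 0]), (3, [])]
```

On `demo`, the edge from 2 to 3 is a cross edge (right to left), the edge from 2 to 0 is a back edge, and the edge from 0 to 3 is a forward edge.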

Forward edges can be decomposed into tree and cross edges by placing indirect nodes.
For example, the graph on the front page becomes a depth-first search tree in Fig. 1, where
solid lines are tree edges and dashed lines are back and cross edges. This is the target
structure we will model in this paper. That is, tree edges are the basis of an inductive structure,
back edges are used to form cycles, and cross edges are used to form sharing. Consequently,
our task is to seek how to characterise pointers that make back edges and cross edges in
inductive constructions.

Figure 1: Depth-first search tree

1.3. Formulation. The crucial idea to formulate pointers in inductive constructions is
to use binders as pointers in abstract syntax. Trees are formulated as terms. Hence, the
remaining problem is how to capture binders in terms exactly. Fiore, Plotkin and Turi
[FPT99] have characterised abstract syntax with variable binding by initial algebras in the
presheaf category $\mathbf{Set}^{\mathbb{F}}$, where $\mathbb{F}$ is the category of finite sets. For example, the abstract syntax
of $\lambda$-terms is modelled as a functor

$$\Lambda : \mathbb{F} \to \mathbf{Set}$$

equipped with three constructors for $\lambda$-terms as an algebra structure on $\Lambda$. Each set $\Lambda(X)$
is the set of all $\lambda$-terms which may contain free variables taken from a set $X$ in $\mathbb{F}$. This
formulation models a structure (here, abstract syntax trees) indexed by a suitable invariant
(here, free variables considered as contexts), which is essential information to capture the
intended structure (abstract syntax with variable binding).
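
The idea of indexing terms by their context of free variables has a familiar programming counterpart. The following Haskell sketch uses the well-known nested datatype encoding of well-scoped $\lambda$-terms; it is an illustration under our own assumptions, not the construction of [FPT99] itself, and the names `Lam`, `rename`, `freeVars` and `sample` are hypothetical.

```haskell
-- Terms whose free variables are drawn from the type v.
-- Abs extends the context by one fresh variable, represented by Nothing.
data Lam v
  = Var v
  | App (Lam v) (Lam v)
  | Abs (Lam (Maybe v))

-- Renaming of free variables: the action of the functor on arrows.
rename :: (v -> w) -> Lam v -> Lam w
rename f (Var x)   = Var (f x)
rename f (App s t) = App (rename f s) (rename f t)
rename f (Abs b)   = Abs (rename (fmap f) b)

-- Free variable occurrences; bound occurrences (Nothing) are dropped under Abs.
freeVars :: Lam v -> [v]
freeVars (Var x)   = [x]
freeVars (App s t) = freeVars s ++ freeVars t
freeVars (Abs b)   = [x | Just x <- freeVars b]

-- (\z. z x) y with free variables 'x' and 'y'.
sample :: Lam Char
sample = App (Abs (App (Var Nothing) (Var (Just 'x')))) (Var 'y')
```

The type parameter plays the role of the index set $X$: a term of type `Lam v` cannot mention a variable outside `v`, so scoping is enforced by the type checker rather than by a side condition.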

However, this approach using algebras in $\mathbf{Set}^{\mathbb{F}}$ is insufficient to represent "cross edges"
in tree-like graphs. Ariola and Klop [AK96] have observed that there are two kinds of
sharing in this kind of tree-like graph:
(i) vertical sharing (i.e. back edges in depth-first search trees), and
(ii) horizontal sharing (i.e. cross edges).
In principle, binders capture "vertical" contexts only, but to represent cross edges exactly,
we must capture "horizontal" context information that cannot be handled by the index
category $\mathbb{F}$.

To solve this problem, in this paper we take a richer index category that is sufficient to
model cross edges. We introduce the notion of shape trees and contexts consisting of them,
which represent the other parts of a tree as viewed from a pointer node. We use the set $\mathbb{T}$ of all
shape trees as "types" of syntax, and the set $\mathbb{T}^*$ of all sequences of shape trees as "contexts".
We follow Fiore's treatment of initial algebra semantics for typed abstract syntax with
variable binding [Fio02] by algebras in the presheaf category $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$. Therefore, cyclic
sharing trees are modelled as a $\mathbb{T}$- and $\mathbb{T}^*$-indexed set

$$T : \mathbb{T} \to (\mathbb{T}^* \to \mathbf{Set})$$

equipped with constructors of cyclic sharing trees as an algebra structure.

1.4. Organisation. We first give types and abstract syntax for cyclic sharing binary trees
in Section 2. We then characterise cyclic sharing binary trees as an initial algebra in
Section 3. Section 4 gives a way of implementing cyclic sharing structures by inductive
types. Section 5 generalises our treatment to arbitrary signatures for cyclic sharing
structures. Section 6 discusses variations of the form of pointers in cyclic sharing
trees. Section 7 relates our representation and equational term graphs in the initial algebra
framework by giving a homomorphic translation. In Section 8 we discuss connections to
other approaches to cyclic sharing structures.

2. Abstract Syntax for Cyclic Sharing Structures 

2.1. Cyclic structures by $\mu$-terms. The $\mu$-notation ($\mu$-terms) for fixed point expressions
is widely used in computer science and logic. Its theory has been investigated thoroughly, for
example, in [AK96, SP00]. The language of $\mu$-terms suffices to express all cyclic structures.
For example, the cyclic binary tree shown in Fig. 2 (i) is representable as the term

$$\mu x.\mathsf{bin}(\mu y_1.\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)),\ \mu y_2.\mathsf{bin}(x, \mathsf{lf}(7))) \qquad (2.1)$$

where $\mathsf{bin}$ and $\mathsf{lf}$ respectively denote a binary node and a leaf. The point is that the variable $x$
refers to the root labelled by a $\mu$-binder, hence a cycle is represented. To formulate
cyclic structures uniquely, we introduce the following assumption: we attach $\mu$-binders in front
of $\mathsf{bin}$ only, and put exactly one $\mu$-binder for each occurrence of $\mathsf{bin}$, as in (2.1). This is
seen as uniform addressing of $\mathsf{bin}$-nodes, i.e., $x, y_1, y_2$ are seen as labels or "addresses" of
$\mathsf{bin}$-nodes. We also assume no axiom to equate $\mu$-terms. That is, we do not identify a
$\mu$-term with its unfolding, since they are different (shapes of) graphs. In summary, $\mu$-terms
represent cyclic structures. This is the underlying idea of the representation of cyclic data
given in [GHUV06] using the functional programming language Haskell.

Figure 2: Trees involving cycle and sharing



2.2. How to represent sharing. Next, we incorporate sharing. The presence of sharing
makes the situation more difficult. Consider the tree (ii) in Fig. 2 involving sharing via a
cross edge. Similarly to the case of cycles, this might be written as a $\mu$-term

$$\mu x.\mathsf{bin}(\mu y_1.\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)),\ \mu y_2.\mathsf{bin}(\Box, \mathsf{lf}(7))).$$

But can we fill the blank $\Box$ to refer to the node a (in Fig. 2 (ii)) from the node c "horizontally"
by using the mechanism of binders? Actually, $\mu$-binders are insufficient for this purpose.
Therefore, we introduce a new notation "${\swarrow}p{\uparrow}x$" to refer to a node horizontally. This
notation means going up to a node $x$ labelled by a $\mu$-binder and going down to a position $p$
in the subtree rooted at the node $x$. In the example presented above, the blank is filled as

$$\mu x.\mathsf{bin}(\mu y_1.\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)),\ \mu y_2.\mathsf{bin}({\swarrow}11{\uparrow}x, \mathsf{lf}(7))).$$

The pointer ${\swarrow}11{\uparrow}x$ means going back to the node $x$, then going down through the left
child twice (using the position 11). See also Example 2.1. In this section, we focus on the
formulation of binary trees involving cycles and sharing. Binary trees are the minimal case
that can involve the notion of sharing in structures. Later, in Section 5, we will consider
general data types.



2.3. Shape trees. We designate our target data structures as cyclic sharing trees and
their syntactic representations as cyclic sharing terms. Cyclic sharing trees are binary trees
generated by nodes of three kinds, i.e., pointer, leaf, and binary node, and satisfying a
certain condition of well-formedness.

To ensure correct sharing, we introduce the notion of shape trees, which are skeletons
of cyclic sharing trees. That is, shape trees are binary trees obtained from cyclic sharing
trees by forgetting the values in pointer nodes and leaves. The set $\mathbb{T}$ of all shape trees is defined by

$$\mathbb{T} \ni \tau ::= \mathsf{E} \mid \mathsf{P} \mid \mathsf{L} \mid \mathsf{B}(\tau_1, \tau_2)$$

where $\mathsf{E}$ is the void shape, $\mathsf{P}$ is the pointer node shape, $\mathsf{L}$ is the leaf node shape, and $\mathsf{B}(\tau_1, \tau_2)$
is the binary node shape. We typically use Greek letters $\sigma, \tau$ to denote shape trees.

We define referable positions in a shape tree. A position is a finite sequence over $\{1, 2\}$.
The root position is denoted by the empty sequence $\epsilon$ and the concatenation of positions is
denoted by $pq$ or $p.q$. The set $\mathcal{P}os(\tau)$ of referable positions in a shape tree $\tau$ is defined by

$$\mathcal{P}os(\mathsf{E}) = \mathcal{P}os(\mathsf{P}) = \emptyset$$

$$\mathcal{P}os(\mathsf{L}) = \{\epsilon\}$$

$$\mathcal{P}os(\mathsf{B}(\sigma, \tau)) = \{\epsilon\} \cup \{1p \mid p \in \mathcal{P}os(\sigma)\} \cup \{2p \mid p \in \mathcal{P}os(\tau)\}.$$

An important point is that the void $\mathsf{E}$ and the pointer $\mathsf{P}$ nodes are not referable by other
nodes, hence their sets of positions are defined to be empty.
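
As a small value-level illustration of these definitions (distinct from the type-level encoding used in Section 4), shape trees and their referable positions can be sketched in Haskell. The names `Shape`, `Pos` and `positions` are our own, and positions are represented as lists over 1 and 2 with the empty list standing for $\epsilon$.

```haskell
-- Shape trees: void, pointer, leaf and binary node shapes.
data Shape = E | P | L | B Shape Shape
  deriving (Eq, Show)

-- A position is a finite sequence over {1,2}; [] is the root position.
type Pos = [Int]

-- The referable positions of a shape tree, following the three defining equations.
positions :: Shape -> [Pos]
positions E       = []
positions P       = []
positions L       = [[]]
positions (B s t) = [] : map (1 :) (positions s) ++ map (2 :) (positions t)
```

For instance, `positions (B (B L L) E)` yields the root, the left child, and the two leaves below it, while nothing under the void right child is referable.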

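2.4. Syntax and types. Shape trees are used as types in a typing judgment. As usual, a
typing context $\Gamma$ is a sequence of (variable, shape tree)-pairs.

Typing rules

$$(\text{Pointer})\ \dfrac{p \in \mathcal{P}os(\sigma)}{\Gamma, x{:}\sigma, \Gamma' \vdash {\swarrow}p{\uparrow}x : \mathsf{P}} \qquad (\text{Leaf})\ \dfrac{k \in \mathbb{Z}}{\Gamma \vdash \mathsf{lf}(k) : \mathsf{L}}$$

$$(\text{Node})\ \dfrac{x{:}\mathsf{B}(\mathsf{E}, \mathsf{E}), \Gamma \vdash s : \sigma \qquad x{:}\mathsf{B}(\sigma, \mathsf{E}), \Gamma \vdash t : \tau}{\Gamma \vdash \mu x.\mathsf{bin}(s, t) : \mathsf{B}(\sigma, \tau)}$$

In these typing rules, a shape tree type is assigned to the corresponding tree node.
That is, a binary node is of type $\mathsf{B}(\sigma, \tau)$ of binary node shape, a pointer node is of type $\mathsf{P}$
of pointer node shape, and a leaf node is of type $\mathsf{L}$ of leaf node shape.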

A type declaration $x : \sigma$ in a typing context (roughly) means that $\sigma$ is the shape of the
subtree (say, $t$) headed by the binder $\mu x$ (see Example 2.1). Hence, in the (Pointer) rule, taking
a position $p \in \mathcal{P}os(\sigma)$, we safely refer to a position in the tree $t$. The notation ${\swarrow}p{\uparrow}x$ is
designed to realise a right-to-left cross edge. Note also that the path described by ${\swarrow}p{\uparrow}x$ is
the shortest path from the pointer node to the node referred to by ${\swarrow}p{\uparrow}x$. When $p = \epsilon$, we
abbreviate ${\swarrow}\epsilon{\uparrow}x$ as ${\uparrow}x$. This ${\uparrow}x$ exactly expresses a back edge. In the (Node) rule, the shape
trees $\mathsf{B}(\mathsf{E}, \mathsf{E})$ and $\mathsf{B}(\sigma, \mathsf{E})$ use the void shape $\mathsf{E}$ to mask nodes that would be reachable
only via left-to-right references (which we do not need) or via redundant references (e.g. going
up to a node $x$ and then going back down through the same path).

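Example 2.1. The binary tree involving sharing in Fig. 2 (ii) is represented as a well-typed
term

$$\mu x.\mathsf{bin}(\mu y_1.\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)),\ \mu y_2.\mathsf{bin}({\swarrow}11{\uparrow}x, \mathsf{lf}(7))).$$

Its typing derivation is the following.

$$\dfrac{\dfrac{y_1{:}\alpha, x{:}\alpha \vdash \mathsf{lf}(5) : \mathsf{L} \quad y_1{:}\mathsf{B}(\mathsf{L},\mathsf{E}), x{:}\alpha \vdash \mathsf{lf}(6) : \mathsf{L}}{x{:}\alpha \vdash \mu y_1.\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)) : \mathsf{B}(\mathsf{L},\mathsf{L})} \quad \dfrac{\dfrac{11 \in \mathcal{P}os(\beta)}{y_2{:}\alpha, x{:}\beta \vdash {\swarrow}11{\uparrow}x : \mathsf{P}} \quad y_2{:}\mathsf{B}(\mathsf{P},\mathsf{E}), x{:}\beta \vdash \mathsf{lf}(7) : \mathsf{L}}{x{:}\beta \vdash \mu y_2.\mathsf{bin}({\swarrow}11{\uparrow}x, \mathsf{lf}(7)) : \mathsf{B}(\mathsf{P},\mathsf{L})}}{\vdash \mu x.\mathsf{bin}(\mu y_1.\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)),\ \mu y_2.\mathsf{bin}({\swarrow}11{\uparrow}x, \mathsf{lf}(7))) : \mathsf{B}(\mathsf{B}(\mathsf{L},\mathsf{L}), \mathsf{B}(\mathsf{P},\mathsf{L}))}$$

where $\alpha = \mathsf{B}(\mathsf{E}, \mathsf{E})$ and $\beta = \mathsf{B}(\mathsf{B}(\mathsf{L}, \mathsf{L}), \mathsf{E})$.

As a result, we can ensure that no dangling pointer occurs in this type system.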
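
The typing rules can be read as a shape inference procedure. The following Haskell sketch is a value-level rendering under our own assumptions: the names `Term`, `Ctx`, `referable` and `check` are hypothetical, variables are strings, and the innermost binding of a name shadows outer ones.

```haskell
data Shape = E | P | L | B Shape Shape
  deriving (Eq, Show)

type Pos = [Int]                 -- positions over {1,2}; [] is the root

data Term
  = Ptr Pos String               -- pointer: go up to binder x, then down along p
  | Lf Int                       -- leaf lf(k)
  | Bin String Term Term         -- binary node with its mu-binder name
  deriving (Eq, Show)

type Ctx = [(String, Shape)]     -- innermost binder first

-- Membership test for the referable positions of a shape.
referable :: Pos -> Shape -> Bool
referable []      L       = True
referable []      (B _ _) = True
referable (1 : p) (B s _) = referable p s
referable (2 : p) (B _ t) = referable p t
referable _       _       = False

-- Infer the shape of a term, or fail if some pointer is not permitted.
check :: Ctx -> Term -> Maybe Shape
check ctx (Ptr p x) = do
  sigma <- lookup x ctx                          -- (Pointer): x must be bound
  if referable p sigma then Just P else Nothing
check _ (Lf _) = Just L                          -- (Leaf)
check ctx (Bin x s t) = do                       -- (Node)
  sigma <- check ((x, B E E) : ctx) s            -- left child sees nothing below x
  tau   <- check ((x, B sigma E) : ctx) t        -- right child sees the left subtree
  Just (B sigma tau)

-- The term of Example 2.1.
example21 :: Term
example21 =
  Bin "x" (Bin "y1" (Lf 5) (Lf 6))
          (Bin "y2" (Ptr [1, 1] "x") (Lf 7))
```

Under these assumptions `check [] example21` returns the shape derived in Example 2.1, whereas a left-to-right reference such as `Bin "x" (Ptr [2] "x") (Lf 1)` is rejected because the right subtree is masked by the void shape.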

Theorem 2.2 (Safety). If a closed term $\vdash t : \tau$ is derivable, any pointer in $t$ points to a
node in $t$.

Proof. By the typing rules, it is obvious that a variable $x$ in a pointer in the resulting $t$ is
always taken from a $\mu$-binder in $t$. Reading the (Node) rule from the bottom to the top, the
shape $\mathsf{B}(\sigma, \tau)$ is always decomposed, and the two shape trees in the typing contexts of the
premises contain fewer possible positions than that of the conclusion, i.e.

$$\mathcal{P}os(\mathsf{B}(\mathsf{E}, \mathsf{E})) \subseteq \mathcal{P}os(\mathsf{B}(\sigma, \mathsf{E})) \subseteq \mathcal{P}os(\mathsf{B}(\sigma, \tau)).$$

This means that at any application of the (Pointer) rule at the top of a typing derivation
tree, the chosen position $p$ is included in the positions of a sub-shape tree of the type of $t$ at
the bottom of the typing derivation. $\Box$

2.5. De Bruijn version. Instead of named variables for binders, a de Bruijn notation is
also possible. The construction rules are reformulated as follows. Now a typing context $\Gamma$ is
simply a sequence of shape trees $\tau_1, \ldots, \tau_n$. Let $|\Gamma|$ denote its length. A judgment $\Gamma \vdash t : \tau$
denotes a well-formed term $t$ of shape $\tau$ containing free variables (de Bruijn indices) from
1 to $|\Gamma|$. The intended meaning is that the length $|\Gamma|$ is the maximal number of steps we can
go up from the current node $t$, and each shape tree $\tau_i$ in $\Gamma$ denotes the shape of the subtree
at the $i$-th node above $t$. Consequently, when $t$ is a pointer, the context specifies the set of
all positions to which the pointer node can refer.

As known from $\lambda$-calculus, using de Bruijn notation, binders become nameless.
Therefore we can safely omit "$x$" from $\mu x$. Because the typing rules are designed to attach
exactly one $\mu$-binder for each $\mathsf{bin}$, even "$\mu$" can be omitted. As a result, we obtain
simplified construction rules of terms.



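Typing rules (de Bruijn version)

$$(\text{dbPointer})\ \dfrac{|\Gamma| = i - 1 \qquad p \in \mathcal{P}os(\sigma)}{\Gamma, \sigma, \Gamma' \vdash {\swarrow}p{\uparrow}i : \mathsf{P}} \qquad (\text{dbLeaf})\ \dfrac{k \in \mathbb{Z}}{\Gamma \vdash \mathsf{lf}(k) : \mathsf{L}}$$

$$(\text{dbNode})\ \dfrac{\mathsf{B}(\mathsf{E}, \mathsf{E}), \Gamma \vdash s : \sigma \qquad \mathsf{B}(\sigma, \mathsf{E}), \Gamma \vdash t : \tau}{\Gamma \vdash \mathsf{bin}(s, t) : \mathsf{B}(\sigma, \tau)}$$

In the (dbPointer) rule, the condition $|\Gamma| = i - 1$ states that the shape tree $\sigma$ appears
at the $i$-th position of the typing context in the conclusion. Because the depth-first search
tree is unique for a given graph, the following is immediate.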

Theorem 2.3 (Uniqueness). Given a rooted graph that is connected, directed and
edge-ordered with each node having out-degree at most 2, the term representation in de Bruijn
notation is unique.

Remark 2.4. This uniqueness of term representation has practical importance. For
instance, for the graph in the tree (ii) in Fig. 2 there is only one way to represent it in this
term syntax, i.e., $\mathsf{bin}(\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)), \mathsf{bin}({\swarrow}11{\uparrow}2, \mathsf{lf}(7)))$ in de Bruijn notation. Therefore, we do not
need any complex equality on graphs (other than syntactic equality) to check whether
given data are the required data. This contrasts directly with other approaches. If we
represent a graph as a term graph with labels [BvEG+87], an equational term graph [AK96],
or a letrec-term [Has97], then several syntactic representations exist for a single graph.
Therefore, some normalisation is required, for instance when defining a function on graphs.
Generally speaking, our terms are regarded as a "de Bruijn notation" of term graphs with
labels [BvEG+87].
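
The passage from named binders to de Bruijn indices can be sketched as follows. This is a hedged illustration under our own assumptions: `Named`, `DB` and `toDB` are hypothetical names, indices count binders upwards starting from 1, and no shape checking is performed here.

```haskell
import Data.List (elemIndex)

-- Named terms: every binary node carries the name of its mu-binder.
data Named
  = NPtr [Int] String
  | NLf Int
  | NBin String Named Named

-- De Bruijn terms: binders are nameless and pointers carry an index.
data DB
  = Ptr [Int] Int
  | Lf Int
  | Bin DB DB
  deriving (Eq, Show)

-- Translate under an environment of enclosing binders, innermost first.
-- Fails on a pointer whose variable is not bound.
toDB :: [String] -> Named -> Maybe DB
toDB env (NPtr p x)   = fmap (\i -> Ptr p (i + 1)) (elemIndex x env)
toDB _   (NLf k)      = Just (Lf k)
toDB env (NBin x s t) = Bin <$> toDB (x : env) s <*> toDB (x : env) t

-- The term of Example 2.1, and a copy that differs only in binder names.
named1, named2 :: Named
named1 = NBin "x" (NBin "y1" (NLf 5) (NLf 6)) (NBin "y2" (NPtr [1, 1] "x") (NLf 7))
named2 = NBin "a" (NBin "b" (NLf 5) (NLf 6)) (NBin "c" (NPtr [1, 1] "a") (NLf 7))
```

Both named terms translate to the same de Bruijn term, so plain syntactic equality (`==` on `DB`) suffices to compare them, in the spirit of Remark 2.4.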
3. Initial Algebra Semantics



In this section, we show that cyclic sharing terms form an initial algebra and derive
structural recursion and induction from it.

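3.1. Construction. We use Fiore's approach to algebras for typed abstract syntax with
binding [Fio02] in the presheaf category $(\mathbf{Set}^{\mathbb{F}\downarrow U})^U$, where $U$ is the set of all types. Now,
we take the set $\mathbb{T}$ of all shape trees for $U$, and the set $\mathbb{N}$ of natural numbers for variables
(i.e. pointers), instead of the category $\mathbb{F}$ of finite sets and all functions (used for renaming
variables), because we do not need renaming of pointers.

We define the discrete category $\mathbb{T}^*$ by taking contexts $\Gamma = \langle \tau_1, \ldots, \tau_n \rangle$ as objects (which
is equivalent to $\mathbb{N}\downarrow\mathbb{T}$). We also regard $\mathbb{T}$ as a discrete category. We consider algebras in
$(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$. Two preliminary definitions are required. We define the presheaf $\mathrm{PO} \in \mathbf{Set}^{\mathbb{T}^*}$ for
pointers by

$$\mathrm{PO}(\langle \tau_1, \ldots, \tau_n \rangle) = \{ {\swarrow}p{\uparrow}i \mid 1 \le i \le n,\ p \in \mathcal{P}os(\tau_i) \}.$$

For each $\tau \in \mathbb{T}$, we define the functor $\delta_\tau : \mathbf{Set}^{\mathbb{T}^*} \to \mathbf{Set}^{\mathbb{T}^*}$ for context extension by

$$\delta_\tau A = A(\langle \tau, - \rangle).$$

We define the signature functor $\Sigma : (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}} \to (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ for cyclic sharing binary
trees, which takes $A \in (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ and a type in $\mathbb{T}$, and gives a presheaf in $\mathbf{Set}^{\mathbb{T}^*}$, as follows:

$$(\Sigma A)_{\mathsf{E}} = 0 \qquad (\Sigma A)_{\mathsf{P}} = \mathrm{PO} \qquad (\Sigma A)_{\mathsf{L}} = K_{\mathbb{Z}} \qquad (\Sigma A)_{\mathsf{B}(\sigma,\tau)} = \delta_{\mathsf{B}(\mathsf{E},\mathsf{E})} A_\sigma \times \delta_{\mathsf{B}(\sigma,\mathsf{E})} A_\tau$$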

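where $K_{\mathbb{Z}}$ is the constant functor to $\mathbb{Z}$, and $0$ is the empty set functor. A $\Sigma$-algebra $\mathcal{A}$ is a
pair $(A, \alpha)$ consisting of a presheaf $A \in (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ for a carrier and a natural transformation
$\alpha : \Sigma A \to A$ for an algebra structure. By definition of $\Sigma$, to give an algebra structure is to
give the following morphisms of $\mathbf{Set}^{\mathbb{T}^*}$:

$$\mathsf{ptr}^A : \mathrm{PO} \to A_{\mathsf{P}} \qquad \mathsf{lf}^A : K_{\mathbb{Z}} \to A_{\mathsf{L}} \qquad \mathsf{bin}^A_{\sigma,\tau} : \delta_{\mathsf{B}(\mathsf{E},\mathsf{E})} A_\sigma \times \delta_{\mathsf{B}(\sigma,\mathsf{E})} A_\tau \to A_{\mathsf{B}(\sigma,\tau)}.$$

A homomorphism $\phi$ of $\Sigma$-algebras from $(A, \alpha)$ to $(B, \beta)$ is a morphism $\phi : A \to B$ such that
$\phi \circ \alpha = \beta \circ \Sigma\phi$.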

Let $T$ be the presheaf of all derivable cyclic sharing terms defined by

$$T_\tau(\Gamma) = \{ t \mid \Gamma \vdash t : \tau \}.$$

Theorem 3.1. For the signature functor $\Sigma$ for cyclic sharing binary trees, $T$ forms an
initial $\Sigma$-algebra.

Proof. Since $\delta_\tau$ preserves $\omega$-colimits, so does $\Sigma$. An initial $\Sigma$-algebra is constructed by the
colimit of the $\omega$-chain $0 \to \Sigma 0 \to \Sigma^2 0 \to \cdots$ [SP82]. These construction steps correspond
to derivations of terms by the typing rules, hence their union $T$ is the colimit. The algebra
structure $\mathsf{in} : \Sigma T \to T$ of the initial algebra is obtained by one-step inference of the typing
rules, i.e., given by the following operations

$$\mathsf{ptr}^T(\Gamma) : \mathrm{PO}(\Gamma) \to T_{\mathsf{P}}(\Gamma);\ {\swarrow}p{\uparrow}i \mapsto {\swarrow}p{\uparrow}i \qquad \mathsf{lf}^T(\Gamma) : \mathbb{Z} \to T_{\mathsf{L}}(\Gamma);\ k \mapsto \mathsf{lf}(k)$$

$$\mathsf{bin}^T_{\sigma,\tau}(\Gamma) : T_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma) \times T_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma) \to T_{\mathsf{B}(\sigma,\tau)}(\Gamma);\ s, t \mapsto \mathsf{bin}(s, t). \qquad \Box$$

The set $T_\tau(\langle\rangle)$ is the set of all complete (i.e. no dangling pointers) cyclic sharing trees
of shape $\tau$.

This development of an initial algebra characterisation follows the line of [FPT99, Fio02,
MS03]. Therefore, we can further develop a full theory of algebraic models of abstract
syntax for cyclic sharing structures along this line. It will provide second-order typed abstract
syntax with object/meta-level variables and substitutions via a substitution monoidal
structure and a free $\Sigma$-monoid [Ham04, Fio08] in $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ (by incorporating suitable arrows into
$\mathbb{T}^*$). Object/meta-substitutions on cyclic sharing structures will provide ways to construct
cyclic sharing structures from smaller structures in a sensible manner. But this is not the
main purpose of this paper. Details will therefore be pursued elsewhere.

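3.2. Structural recursion principle. An important benefit of the initial algebra
characterisation is that the unique homomorphism from the initial algebra to another algebra is a
mapping defined by structural recursion.

Theorem 3.2. The unique homomorphism $\phi$ from the initial $\Sigma$-algebra $T$ to a $\Sigma$-algebra
$A$ is described as

$$\phi_{\mathsf{P}}(\Gamma)({\swarrow}p{\uparrow}i) = \mathsf{ptr}^A(\Gamma)({\swarrow}p{\uparrow}i)$$

$$\phi_{\mathsf{L}}(\Gamma)(\mathsf{lf}(k)) = \mathsf{lf}^A(\Gamma)(k)$$

$$\phi_{\mathsf{B}(\sigma,\tau)}(\Gamma)(\mathsf{bin}(s, t)) = \mathsf{bin}^A(\Gamma)(\phi_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma)(s),\ \phi_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma)(t)).$$

Proof. This follows since the unique homomorphism $\phi : T \to A$ is a morphism of $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$. $\Box$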

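Example 3.3. We give examples of functions on $T$ defined by structural recursion.

(i) The function leaves that collects all leaf values in a cyclic sharing tree $t \in T_\tau(\Gamma)$:

$$\mathsf{leaves} : T \to K_{\mathcal{P}(\mathbb{Z})}$$

$$\mathsf{leaves}_{\mathsf{P}}(\Gamma)({\swarrow}p{\uparrow}i) = \emptyset$$

$$\mathsf{leaves}_{\mathsf{L}}(\Gamma)(\mathsf{lf}(k)) = \{k\}$$

$$\mathsf{leaves}_{\mathsf{B}(\sigma,\tau)}(\Gamma)(\mathsf{bin}(s, t)) = \mathsf{leaves}_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma)(s) \cup \mathsf{leaves}_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma)(t).$$

This is because leaves is the unique homomorphism from $T$ to a $\Sigma$-algebra $K_{\mathcal{P}(\mathbb{Z})}$ (the
constant bifunctor to the power set of the integers $\mathbb{Z}$) whose operations are given by

$$\mathsf{ptr}^{K_{\mathcal{P}(\mathbb{Z})}}(\Gamma)({\swarrow}p{\uparrow}i) = \emptyset$$

$$\mathsf{lf}^{K_{\mathcal{P}(\mathbb{Z})}}(\Gamma)(k) = \{k\}$$

$$\mathsf{bin}^{K_{\mathcal{P}(\mathbb{Z})}}(\Gamma)(x, y) = x \cup y.$$

(ii) The function height that computes the height of a cyclic sharing tree $t$:

$$\mathsf{height} : T \to K_{\mathbb{Z}}$$

$$\mathsf{height}_{\mathsf{P}}(\Gamma)({\swarrow}p{\uparrow}i) = 1$$

$$\mathsf{height}_{\mathsf{L}}(\Gamma)(\mathsf{lf}(k)) = 1$$

$$\mathsf{height}_{\mathsf{B}(\sigma,\tau)}(\Gamma)(\mathsf{bin}(s, t)) = \max(\mathsf{height}_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma)(s),\ \mathsf{height}_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma)(t)) + 1$$

where max is the maximum function on $\mathbb{Z}$. This is because height is the unique
homomorphism from $T$ to a $\Sigma$-algebra $K_{\mathbb{Z}}$ whose algebra structure is the obvious one. Notice that
the height is not so directly definable in ordinary graph representations.

(iii) The function skeleton that computes the shape of a given cyclic sharing tree $t$:

$$\mathsf{skeleton} : T \to K_{\mathbb{T}}$$

$$\mathsf{skeleton}_{\mathsf{P}}(\Gamma)({\swarrow}p{\uparrow}i) = \mathsf{P}$$

$$\mathsf{skeleton}_{\mathsf{L}}(\Gamma)(\mathsf{lf}(k)) = \mathsf{L}$$

$$\mathsf{skeleton}_{\mathsf{B}(\sigma,\tau)}(\Gamma)(\mathsf{bin}(s, t)) = \mathsf{B}(\mathsf{skeleton}_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma)(s),\ \mathsf{skeleton}_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma)(t)).$$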

From an algorithmic perspective, the structural recursion principle provides depth-first 
search traversal of a rooted graph. Consequently, graph algorithms based on depth-first 
search are directly programmable using this structural recursion. On the author's home page 
(http://www.cs.gunma-u.ac.jp/~hamana/), several other simple graph algorithms have 
been programmed using structural recursion. 
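
The three functions of Example 3.3 can be sketched as instances of a single fold. The sketch below is a simplification under our own assumptions: terms are untyped de Bruijn terms (shape and context indices are dropped, which matches target algebras with constant carriers), leaf values are collected in a list rather than a set, and the names `Alg` and `fold` are hypothetical.

```haskell
data Shape = E | P | L | B Shape Shape
  deriving (Eq, Show)

-- Untyped de Bruijn terms: pointer (position, index), leaf, binary node.
data Term = Ptr [Int] Int | Lf Int | Bin Term Term

-- An algebra with constant carrier a: one operation per constructor.
data Alg a = Alg
  { ptrA :: [Int] -> Int -> a
  , lfA  :: Int -> a
  , binA :: a -> a -> a
  }

-- The mapping into the algebra, defined by structural recursion.
fold :: Alg a -> Term -> a
fold alg (Ptr p i) = ptrA alg p i
fold alg (Lf k)    = lfA alg k
fold alg (Bin s t) = binA alg (fold alg s) (fold alg t)

leaves :: Term -> [Int]
leaves = fold (Alg (\_ _ -> []) (\k -> [k]) (++))

height :: Term -> Int
height = fold (Alg (\_ _ -> 1) (const 1) (\x y -> max x y + 1))

skeleton :: Term -> Shape
skeleton = fold (Alg (\_ _ -> P) (const L) B)

-- The de Bruijn term of Remark 2.4.
example :: Term
example = Bin (Bin (Lf 5) (Lf 6)) (Bin (Ptr [1, 1] 2) (Lf 7))
```

Since the recursion follows tree edges only and treats pointers as base cases, each function terminates even though the represented graph may contain cycles.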

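3.3. Structural induction principle. Another important benefit of the initial algebra
characterisation is its tight connection to the structural induction principle. To derive it, following
[HJ98, Jac99], we use the category $\mathrm{Sub}((\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}})$ of predicates on $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ defined by

• objects: sub-presheaves $(P \hookrightarrow U)$, i.e., inclusions between $P, U \in (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$,

• arrows: $u : (Q \hookrightarrow V) \to (P \hookrightarrow U)$ are natural transformations $u : V \to U$ between
underlying presheaves satisfying: $a \in Q_\tau(\Gamma)$ implies $u(a) \in P_\tau(\Gamma)$ for all $\tau \in \mathbb{T}$, $\Gamma \in \mathbb{T}^*$.

A sub-presheaf $(P \hookrightarrow T)$ is seen as a predicate $P$ on cyclic sharing terms $T$, which is indexed
by types and contexts. So, we say "$P^\Gamma_\tau(t)$ holds" when $t \in P_\tau(\Gamma)$ for $t \in T_\tau(\Gamma)$.

We consider $\Sigma$-algebras in $\mathrm{Sub}((\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}})$ by "logical predicate" lifting [HJ98] of algebras
in $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$. This is a lifting in the following sense: we consider the functor
$p : \mathrm{Sub}((\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}) \to (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ sending $(P \hookrightarrow U)$ to the underlying presheaf $U$. Then we lift
the functor $\Sigma$ to $\Sigma^{\mathrm{pred}}$ in a commuting diagram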

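$$\begin{array}{ccc}
\mathrm{Sub}((\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}) & \xrightarrow{\ \Sigma^{\mathrm{pred}}\ } & \mathrm{Sub}((\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}) \\
p \downarrow & & \downarrow p \\
(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}} & \xrightarrow{\ \ \Sigma\ \ } & (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}
\end{array}$$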

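by induction on the structure of $\Sigma$:

$$(\Sigma^{\mathrm{pred}}(P \hookrightarrow U))_{\mathsf{P}} = (\mathrm{PO} \hookrightarrow \mathrm{PO})$$

$$(\Sigma^{\mathrm{pred}}(P \hookrightarrow U))_{\mathsf{E}} = (0 \hookrightarrow 0)$$

$$(\Sigma^{\mathrm{pred}}(P \hookrightarrow U))_{\mathsf{L}} = (K_{\mathbb{Z}} \hookrightarrow K_{\mathbb{Z}})$$

$$(\Sigma^{\mathrm{pred}}(P \hookrightarrow U))_{\mathsf{B}(\sigma,\tau)} = \delta_{\mathsf{B}(\mathsf{E},\mathsf{E})}(P \hookrightarrow U)_\sigma \times \delta_{\mathsf{B}(\sigma,\mathsf{E})}(P \hookrightarrow U)_\tau$$

where we also lift the context extension to $\delta_\tau : \mathrm{Sub}(\mathbf{Set}^{\mathbb{T}^*}) \to \mathrm{Sub}(\mathbf{Set}^{\mathbb{T}^*})$ defined by

$$\delta_\tau(A \hookrightarrow B) = (A(\tau, -) \hookrightarrow B(\tau, -)).$$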
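A $\Sigma^{\mathrm{pred}}$-algebra structure $\alpha : \Sigma^{\mathrm{pred}}(P \hookrightarrow T) \to (P \hookrightarrow T)$ can be read as the induction
steps in a proof by structural induction. For example, the operation in $\mathrm{Sub}((\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}})$

$$\mathsf{bin}^{\mathrm{pred}}_{\sigma,\tau}(\Gamma) : (P_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma) \hookrightarrow T_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma)) \times (P_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma) \hookrightarrow T_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma)) \to (P_{\mathsf{B}(\sigma,\tau)}(\Gamma) \hookrightarrow T_{\mathsf{B}(\sigma,\tau)}(\Gamma))$$

$$s, t \mapsto \mathsf{bin}(s, t)$$

means that "if $P_\sigma^{\mathsf{B}(\mathsf{E},\mathsf{E}),\Gamma}(s)$ and $P_\tau^{\mathsf{B}(\sigma,\mathsf{E}),\Gamma}(t)$ hold, then $P^\Gamma_{\mathsf{B}(\sigma,\tau)}(\mathsf{bin}(s, t))$ holds."

Jacobs showed that if a fibration $p : \mathbb{E} \to \mathbb{B}$ satisfies several conditions (having fibered
(co)products, etc.), then the logical predicate lifting from $\mathbb{B}$ to $\mathbb{E}$ preserves initial algebras
(Prop. 9.2.7 in [Jac99]). The functor $p : \mathrm{Sub}((\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}) \to (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ is actually such a
fibration. Consequently, because $T$ is an initial $\Sigma$-algebra, $(T \hookrightarrow T)$ is an initial
$\Sigma^{\mathrm{pred}}$-algebra. The unique homomorphism $\phi : (T \hookrightarrow T) \to (P \hookrightarrow T)$ means that $P$ holds for
all cyclic sharing terms in $T$. Hence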

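Theorem 3.4. Let $P$ be a predicate on $T$. To prove that $P^\Gamma_\tau(t)$ holds for all $t \in T_\tau(\Gamma)$, it
suffices to show
(i) $P^\Gamma_{\mathsf{P}}({\swarrow}p{\uparrow}i)$ holds for all ${\swarrow}p{\uparrow}i \in \mathrm{PO}(\Gamma)$,
(ii) $P^\Gamma_{\mathsf{L}}(\mathsf{lf}(k))$ holds for all $k \in \mathbb{Z}$,
(iii) if $P_\sigma^{\mathsf{B}(\mathsf{E},\mathsf{E}),\Gamma}(s)$ and $P_\tau^{\mathsf{B}(\sigma,\mathsf{E}),\Gamma}(t)$ hold, then $P^\Gamma_{\mathsf{B}(\sigma,\tau)}(\mathsf{bin}(s, t))$ holds.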

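This structural induction principle is useful to prove properties of functions on cyclic
sharing terms defined by structural recursion. As an example, we show the following simple
property of the function skeleton defined in Example 3.3.

Proposition 3.5. For all $t \in T_\tau(\Gamma)$, $\mathsf{skeleton}_\tau(\Gamma)(t) = \tau$.

Proof. By structural induction on $t$.

(i) Case $t = {\swarrow}p{\uparrow}i \in \mathrm{PO}(\Gamma)$. By definition, $\mathsf{skeleton}_{\mathsf{P}}(\Gamma)({\swarrow}p{\uparrow}i) = \mathsf{P}$.
(ii) Case $t = \mathsf{lf}(k)$. By definition, $\mathsf{skeleton}_{\mathsf{L}}(\Gamma)(\mathsf{lf}(k)) = \mathsf{L}$.
(iii) Case $t = \mathsf{bin}(s_1, s_2)$. Then,

$$\mathsf{skeleton}_{\mathsf{B}(\sigma,\tau)}(\Gamma)(\mathsf{bin}(s_1, s_2)) = \mathsf{B}(\mathsf{skeleton}_\sigma(\mathsf{B}(\mathsf{E},\mathsf{E}), \Gamma)(s_1),\ \mathsf{skeleton}_\tau(\mathsf{B}(\sigma,\mathsf{E}), \Gamma)(s_2)) = \mathsf{B}(\sigma, \tau)$$

by the induction hypothesis. $\Box$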

4. Inductive Types for Cyclic Sharing Structures 

In this section, we achieve our goal [II] of giving inductive types for cyclic sharing
structures. We give implementations in two different systems. We first use the functional
language Haskell for an implementation because
(i) we want to show that our characterisation of cyclic sharing is available in today's
    programming language technology, and
(ii) Haskell's type system is powerful enough to implement our initial algebra
    characterisation faithfully.
Secondly, we give an implementation by dependent types in the proof assistant Agda.
4.1. A GADT definition in Haskell. Because the set $T_\tau(\Gamma)$ of cyclic sharing terms
depends on a shape tree and a context, it should be implemented as a dependent type.

We have seen in the proof of Theorem 3.1 that constructors of cyclic sharing terms are
$\mathbb{T}$- and $\mathbb{T}^*$-indexed functions. Inductive types defined by indexed constructors have been
known as inductive families in dependent type theories [Dyb94]. Recently, the Glasgow
Haskell Compiler (GHC) has incorporated this feature as GADTs (generalised algebraic data
types) [PVWW06]. Using another feature called type classes, we can realise lightweight
dependently-typed programming in Haskell [McB02].

We will implement $T_\tau(\Gamma)$ as a GADT "T n t" that depends on a context n (for $\Gamma$) and
a shape tree t (for $\tau$). In Haskell, a type can only depend on types (not values). For that
reason, we first define type-level shape trees by using a type class.



data E
data P
data L     = StopLf
data B a b = DnL a | DnR b | StopB

class Shape t
instance Shape E
instance Shape P
instance Shape L
instance (Shape s, Shape t) => Shape (B s t)

These define constructors of shape trees as types E, P, L and a type constructor B, then group
them by the type class Shape. Values of a shape tree type $\tau$ are defined by $\mathcal{P}os(\tau)$, i.e.
"referable positions" in $\tau$. For example, consider a shape tree $\mathsf{B}(\mathsf{B}(\mathsf{L},\mathsf{L}),\mathsf{L})$. The position
$1 \cdot 2$ in this shape tree is coded as the well-typed term DnL (DnR StopLf) :: B (B L L) L
where StopLf means stopping at a leaf.

Similarly, a typing context $\langle \tau_1, \ldots, \tau_n \rangle$ is coded as a type-level sequence

TyCtx $\tau_1$ (TyCtx $\tau_2$ $\cdots$ (TyCtx $\tau_n$ TyEmp))

and the type constructors are grouped by the type class Ctx. Values of a context type are
"pointers" (e.g. (Up UpStop) meaning ${\uparrow}2$).

data TyEmp
data TyCtx t n = Up n | UpStop | UpGD t

class Ctx n
instance Ctx TyEmp
instance (Shape t, Ctx n) => Ctx (TyCtx t n)

Finally, we define the set $T_\tau(\Gamma)$ as a GADT "T" that takes a context and a shape tree as
the two arguments of the type constructor T.



data T :: * -> * -> * where
  Ptr :: Ctx n => n -> T n P
  Lf  :: Ctx n => Int -> T n L
  Bin :: (Ctx n, Shape s, Shape t) =>
         T (TyCtx (B E E) n) s -> T (TyCtx (B s E) n) t -> T n (B s t)



This defines the three constructors of cyclic sharing terms faithfully. Note that the part
"Ctx n =>", called a context of a type class, is a quantification that reads "for every
type n which is an instance of the type class Ctx".
For example, the term in Example 2.1 is indeed a well-typed term; its type is inferred
by GHC (by invoking the command :t in the interpreter):

Bin (Bin (Lf 5) (Lf 6))
    (Bin (Ptr (Up (UpGD (DnL (DnL StopLf))))) (Lf 7))
  :: T TyEmp (B (B L L) (B P L))

The term Up (UpGD (DnL (DnL StopLf))) is the representation of the pointer $\swarrow 11{\uparrow}2$ in
de Bruijn notation, which is read from the top as "going up and up, then going down (GD
is short for going down) to the position 11 and stopping at a leaf". Type inference and
type checking automatically ensure the well-formedness of cyclic sharing terms.
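The declarations of this subsection can be assembled into one module to check the example. The sketch below is an assumption-laden transcription rather than the author's distributed implementation: it writes Type (from Data.Kind) in place of *, and the names example and size are hypothetical additions.

```haskell
{-# LANGUAGE GADTs, KindSignatures #-}
import Data.Kind (Type)

data E
data P
data L = StopLf
data B a b = DnL a | DnR b | StopB

class Shape t
instance Shape E
instance Shape P
instance Shape L
instance (Shape s, Shape t) => Shape (B s t)

data TyEmp
data TyCtx t n = Up n | UpStop | UpGD t

class Ctx n
instance Ctx TyEmp
instance (Shape t, Ctx n) => Ctx (TyCtx t n)

data T :: Type -> Type -> Type where
  Ptr :: Ctx n => n -> T n P
  Lf  :: Ctx n => Int -> T n L
  Bin :: (Ctx n, Shape s, Shape t)
      => T (TyCtx (B E E) n) s -> T (TyCtx (B s E) n) t -> T n (B s t)

-- The term of Example 2.1, annotated with the type reported in the text.
example :: T TyEmp (B (B L L) (B P L))
example =
  Bin (Bin (Lf 5) (Lf 6))
      (Bin (Ptr (Up (UpGD (DnL (DnL StopLf))))) (Lf 7))

-- Hypothetical helper: number of nodes, for any context and any shape.
size :: T n t -> Int
size (Ptr _)   = 1
size (Lf _)    = 1
size (Bin l r) = 1 + size l + size r
```

Replacing the pointer by, say, Up (UpGD (DnR (DnL StopLf))) makes the module fail to type-check, because B (B L L) E has no position below its masked right child.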

In Haskell, we can use the GADT T just like an ordinary algebraic datatype. Therefore,
we can define a function on it by structural recursion as described in Example 3.3 (even
more simply: shape tree and context parameters are unnecessary in defining functions because of
Haskell's compilation method [PVWW06]). The implementation and additional examples
using the GADT T are available from the author's home page.
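To make the remark about structural recursion concrete, here is a minimal sketch of two recursive functions on the GADT. It departs from the text in one labelled way: the classes Shape and Ctx are given the hypothetical methods posOf and ptrOf, so that a pointer value can be decoded into the pair (number of levels up, position). The reading of UpStop as "this context entry, position $\epsilon$" is likewise an assumption.

```haskell
{-# LANGUAGE GADTs, KindSignatures, EmptyCase #-}
import Data.Kind (Type)

data E
data P
data L = StopLf
data B a b = DnL a | DnR b | StopB

-- Assumption: Shape carries a decoder from position values to digit lists.
class Shape t where
  posOf :: t -> [Int]
instance Shape E where posOf e = case e of {}
instance Shape P where posOf p = case p of {}
instance Shape L where posOf StopLf = []
instance (Shape s, Shape t) => Shape (B s t) where
  posOf (DnL a) = 1 : posOf a
  posOf (DnR b) = 2 : posOf b
  posOf StopB   = []

data TyEmp
data TyCtx t n = Up n | UpStop | UpGD t

-- Assumption: Ctx decodes a pointer into (levels up, position below).
class Ctx n where
  ptrOf :: n -> (Int, [Int])
instance Ctx TyEmp where ptrOf n = case n of {}
instance (Shape t, Ctx n) => Ctx (TyCtx t n) where
  ptrOf (Up n)   = let (i, p) = ptrOf n in (i + 1, p)
  ptrOf UpStop   = (1, [])
  ptrOf (UpGD t) = (1, posOf t)

data T :: Type -> Type -> Type where
  Ptr :: Ctx n => n -> T n P
  Lf  :: Ctx n => Int -> T n L
  Bin :: (Ctx n, Shape s, Shape t)
      => T (TyCtx (B E E) n) s -> T (TyCtx (B s E) n) t -> T n (B s t)

-- Structural recursion: sum of the leaf values.
sumLeaves :: T n t -> Int
sumLeaves (Ptr _)   = 0
sumLeaves (Lf k)    = k
sumLeaves (Bin l r) = sumLeaves l + sumLeaves r

-- Structural recursion: every pointer, decoded as (levels up, position).
pointers :: T n t -> [(Int, [Int])]
pointers (Ptr n)   = [ptrOf n]
pointers (Lf _)    = []
pointers (Bin l r) = pointers l ++ pointers r

example :: T TyEmp (B (B L L) (B P L))
example =
  Bin (Bin (Lf 5) (Lf 6))
      (Bin (Ptr (Up (UpGD (DnL (DnL StopLf))))) (Lf 7))
```

Neither function mentions the context or shape indices in its clauses; the type signatures alone carry them, and the recursive calls are made at different indices.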

4.2. A dependent type definition in Agda. Secondly, we consider a definition in the proof
assistant and dependently-typed programming language Agda [Nor07]. There are several ways
to implement it. One way is to use the so-called universe construction [OS08], defining
decoding functions from type names to actual types to mimic the type class mechanism used
in the previous subsection. The resulting definition might resemble the Haskell version.
Another, more natural way is to use the full power of dependent types in Agda. In this
subsection, we take this approach. We implement the initial algebra T of cyclic sharing
tree structures as a dependent type that depends on two values (not types as in Haskell),
a shape tree and a context.

We make full use of Agda's notational advantage, which allows Unicode for mathematical
symbols in a program. In the following Agda code, we use the mathematical symbols
used in the paper to the greatest degree possible (but it is real Agda code, not
pseudocode).

We define shape trees as a usual inductive type, and contexts as the type of sequences 
of shape trees (where ■ is the empty context and "," is the separator): 

data Shape : Set where
  E : Shape
  P : Shape
  L : Shape
  B : Shape → Shape → Shape

data Ctx : Set where
  ■   : Ctx
  _,_ : Shape → Ctx → Ctx

The type Pos τ of positions of a shape tree τ is defined naturally as an Agda inductive
family, which consists of indexed constructors. The style of definition is almost identical to
that of GADTs in Haskell.

data Pos : Shape → Set where
  …   : Pos L
  ε   : ∀ {σ τ} → Pos (B σ τ)
  DnL : ∀ {σ τ} → Pos σ → Pos (B σ τ)
  DnR : ∀ {σ τ} → Pos τ → Pos (B σ τ)

To define the dependent type T for cyclic sharing trees, the crucial ingredient is the
implementation of the presheaf PO of pointers. Recall that it was defined by

$$\mathrm{PO}(\langle\tau_1, \ldots, \tau_n\rangle) = \{\swarrow p{\uparrow}i \mid 1 \le i \le n,\ p \in \mathcal{P}os(\tau_i)\}.$$

What we need is a concise way to pick an index $i$ and the shape tree $\tau_i$ from a typing context.
The following type Index does this job.

data Index : Shape → Ctx → Set where
  one : ∀ {Γ τ} → Index τ (τ , Γ)
  s   : ∀ {Γ τ σ} → Index τ Γ → Index τ (σ , Γ)

A well-typed term "i : Index τ Γ" means "i is the index of a shape tree τ in Γ =
τ1 , … , τ , … , τn", e.g., s (s one) : Index τ3 (τ1 , τ2 , τ3 , ■). The presheaf PO is then
implemented naturally.

data PO : Ctx → Set where
  ↙_↑_ : ∀ {Γ τ} → Pos τ → Index τ Γ → PO Γ

Using these ingredients, the implementation of the initial algebra T for cyclic sharing trees
is essentially the same as the mathematical definition we obtained in Theorem 3.1:

data T : Ctx → Shape → Set where
  ptr : ∀ {Γ} → PO Γ → T Γ P
  lf  : ∀ {Γ} → Int → T Γ L
  bin : ∀ {Γ σ τ} → T (B E E , Γ) σ → T (B σ E , Γ) τ → T Γ (B σ τ)



For example, the term in Example 2.1 is also a well-typed term in Agda.

bin (bin (lf 5) (lf 6)) (bin (ptr (↙ DnL (DnL ε) ↑ s one)) (lf 7)) : T ■ (B (B L L) (B P L))

Defining a function on the type T by structural recursion is directly possible because of
Agda's pattern matching mechanism on dependent types. The functions in Example 3.3
can be defined directly. In addition, shape tree and context parameters can be made
implicit arguments (a feature of Agda). Therefore, we can use such functions concisely by
omitting complex indices, as in the case of GADTs in Haskell.
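For readers who prefer to stay in Haskell, the value-indexed style of this subsection can be approximated with GHC's DataKinds extension. The sketch below is an analogue under stated assumptions, not the Agda development itself: contexts are modelled as promoted lists of shapes, and the constructor names Stop, Here, One, S and the function leaves are hypothetical.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import Data.Kind (Type)

-- Shape trees as ordinary values, promoted to the type level by DataKinds.
data Shape = E | P | L | B Shape Shape

-- Positions of a shape tree, indexed by that shape.
data Pos :: Shape -> Type where
  Stop :: Pos 'L                      -- assumed name for the leaf position
  Here :: Pos ('B s t)                -- the position epsilon of a binary node
  DnL  :: Pos s -> Pos ('B s t)
  DnR  :: Pos t -> Pos ('B s t)

-- Index t g: evidence that the shape t occurs in the context g.
data Index :: Shape -> [Shape] -> Type where
  One :: Index t (t ': g)
  S   :: Index t g -> Index t (s ': g)

-- A pointer: a position in some shape that occurs in the context.
data PO :: [Shape] -> Type where
  PO :: Pos t -> Index t g -> PO g

data T :: [Shape] -> Shape -> Type where
  Ptr :: PO g -> T g 'P
  Lf  :: Int -> T g 'L
  Bin :: T ('B 'E 'E ': g) s -> T ('B s 'E ': g) t -> T g ('B s t)

-- The term of Example 2.1 in the empty context.
example :: T '[] ('B ('B 'L 'L) ('B 'P 'L))
example =
  Bin (Bin (Lf 5) (Lf 6))
      (Bin (Ptr (PO (DnL (DnL Stop)) (S One))) (Lf 7))

-- Structural recursion with the indices left implicit in the clauses.
leaves :: T g t -> [Int]
leaves (Ptr _)   = []
leaves (Lf k)    = [k]
leaves (Bin l r) = leaves l ++ leaves r
```

Compared with the type-class encoding of Sect. 4.1, no class constraints are needed here, because the kind Shape already restricts which indices can appear.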

5. General Signature 

We give a construction of cyclic sharing structures for arbitrary signatures as a natural
generalisation of the binary tree case.

A signature $\Sigma$ for cyclic sharing structures consists of a set $\Sigma$ of function symbols having
arities. A function symbol of arity $n \in \mathbb{N}$ is denoted by $f^{(n)}$. Each function symbol $f$ has
an associated shape symbol $[f]$ (typically written in small caps such as B).

Example 5.1. For the case of cyclic sharing binary trees, the signature $\Sigma$ consists of $\mathsf{bin}^{(2)}$
and $\mathsf{lf}^{(0)}$. The corresponding shape symbols are defined by $[\mathsf{bin}] = B$, $[\mathsf{lf}] = L$.
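The set $\mathbb{T}$ of all shape trees is defined by

$$\mathbb{T} \ni \tau ::= E \mid P \mid [f](\tau_1, \ldots, \tau_n) \quad \text{for each } f^{(n)} \in \Sigma.$$

The set of all contexts is

$$\mathbb{T}^* = \{\langle\tau_1, \ldots, \tau_n\rangle \mid n \in \mathbb{N},\ i \in \{1, \ldots, n\},\ \tau_i \in \mathbb{T}\}.$$

Positions are defined by

$$\mathcal{P}os(E) = \mathcal{P}os(P) = \emptyset$$
$$\mathcal{P}os([f](\tau_1, \ldots, \tau_n)) = \{\epsilon\} \cup \{1.p \mid p \in \mathcal{P}os(\tau_1)\} \cup \cdots \cup \{n.p \mid p \in \mathcal{P}os(\tau_n)\}.$$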
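The definition of positions can be read directly as a recursive function. The following Haskell sketch works with value-level shape trees for an arbitrary signature; the type Shape, its constructor F (a shape symbol with its argument shapes) and the function positions are names assumed here for illustration.

```haskell
-- Value-level shape trees: E, P, or a shape symbol applied to argument shapes.
data Shape = E | P | F String [Shape]
  deriving (Eq, Show)

-- A position is a sequence of child indices; [] plays the role of epsilon.
type Pos = [Int]

-- Pos(E) = Pos(P) = {} and
-- Pos([f](t1..tn)) = {eps} + {1.p | p in Pos(t1)} + ... + {n.p | p in Pos(tn)}.
positions :: Shape -> [Pos]
positions E        = []
positions P        = []
positions (F _ ts) = [] : [ i : p | (i, t) <- zip [1 ..] ts, p <- positions t ]

-- The binary-tree shape B(B(L,L),L) used in Sect. 4.1.
example :: Shape
example = F "B" [F "B" [F "L" [], F "L" []], F "L" []]
```

For this example, positions yields the five positions $\epsilon$, 1, 11, 12 and 2, matching the five nodes of the shape tree.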

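Typing rules

$$\frac{|\Gamma| = i - 1 \qquad p \in \mathcal{P}os(\sigma)}{\Gamma, \sigma, \Gamma' \vdash\ \swarrow p{\uparrow}i : P}
\qquad
\frac{\gamma_1, \Gamma \vdash t_1 : \tau_1 \ \cdots\ \gamma_n, \Gamma \vdash t_n : \tau_n \qquad f^{(n)} \in \Sigma}{\Gamma \vdash f(t_1, \ldots, t_n) : [f](\tau_1, \ldots, \tau_n)}$$

where $\gamma_1 = [f](E, \ldots, E)$ and $\gamma_{i+1} = [f](\tau_1, \ldots, \tau_i, E, \ldots, E)$ for each $1 \le i \le n-1$.

The shape trees $\gamma_i$ are also used below.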

This general case has the safety and uniqueness properties as well. 

Theorem 5.2 (Safety). If a closed term $\vdash t : \tau$ is derivable, any pointer in $t$ points to a
node in $t$.

Theorem 5.3 (Uniqueness). Given a rooted graph that is connected, directed and edge- 
ordered, the term representation is unique. 

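Next we provide an initial algebra characterisation. The base category is $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$. The
presheaf PO of pointers is defined in the same way as in Sect. 3. To a signature $\Sigma$, we associate
a signature functor $\Sigma : (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}} \to (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ defined by

$$(\Sigma A)_E = 0 \qquad (\Sigma A)_P = \mathrm{PO} \qquad (\Sigma A)_{[f](\tau_1, \ldots, \tau_n)} = \prod_{1 \le i \le n} \delta_{\gamma_i} A_{\tau_i} \quad \text{for each } f^{(n)} \in \Sigma.$$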

The following theorems are straightforward generalisations of the corresponding theorems
for the binary tree case, and so are their proofs.

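Theorem 5.4 (Initial algebra). Let $\Sigma$ be a signature. $T_\tau(\Gamma) = \{t \mid \Gamma \vdash t : \tau\}$ forms an
initial $\Sigma$-algebra where the operations are:

$$\mathsf{ptr}^T(\Gamma) : \mathrm{PO}(\Gamma) \to T_P(\Gamma), \qquad \swarrow p{\uparrow}i \ \mapsto\ \swarrow p{\uparrow}i$$
$$f^T(\Gamma) : \prod_{1 \le i \le n} T_{\tau_i}(\gamma_i, \Gamma) \to T_{[f](\tau_1, \ldots, \tau_n)}(\Gamma), \qquad t_1, \ldots, t_n \ \mapsto\ f(t_1, \ldots, t_n).$$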

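Theorem 5.5 (Structural recursion). The unique homomorphism $\phi$ from the initial
$\Sigma$-algebra $T$ to a $\Sigma$-algebra $A$ is described as

$$\phi_P(\Gamma)(\swarrow p{\uparrow}i) = \mathsf{ptr}^A(\Gamma)(\swarrow p{\uparrow}i)$$
$$\phi_{[f](\tau_1, \ldots, \tau_n)}(\Gamma)(f(t_1, \ldots, t_n)) = f^A(\Gamma)(\phi_{\tau_1}(\gamma_1, \Gamma)(t_1), \ldots, \phi_{\tau_n}(\gamma_n, \Gamma)(t_n)).$$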

Theorem 5.6 (Structural induction). To prove that $P^\Gamma_\tau(t)$ holds for all $t \in T_\tau(\Gamma)$, it suffices
to show

(i) $P^\Gamma_P(\swarrow p{\uparrow}i)$ holds for all $\swarrow p{\uparrow}i \in \mathrm{PO}(\Gamma)$,
(ii) if $f^{(n)} \in \Sigma$ and $P^{\gamma_i, \Gamma}_{\tau_i}(t_i)$ holds for all $i = 1, \ldots, n$, then $P^\Gamma_{[f](\tau_1, \ldots, \tau_n)}(f(t_1, \ldots, t_n))$
holds.

Moreover, giving a GADT in Haskell and a dependent type in Agda for cyclic sharing
structures of a given signature is straightforward, along the lines of Sect. 4 for the signature
of binary cyclic sharing trees.

6. Variations of the Form of Pointers 

We have concentrated up to this point on unique representation of a given rooted graph. 
This has been achieved by imposing the form of pointers in cyclic sharing trees only from 
right to left. In this section, we consider relaxation of this restriction as a variation of the 
theme of the paper. 

Actually, our algebraic framework, algebras in $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$, is not only for depth-first
search trees. Algebras in $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ can model trees with arbitrary pointers, and the form of
pointers can be controlled by types using shape trees. That is, our framework has sufficient
flexibility to represent any form of pointers precisely. In addition, from the application
perspective, other forms of pointers are useful. For example, some algorithms may need to invert
pointers in a cyclic sharing tree. In such a case, one needs to use pointers
from left to right, not only from right to left.

Syntactically, the form of pointers is determined by the shape tree types of function symbols
and the definition of the position function $\mathcal{P}os$. Consequently, relaxing the restrictions is an
easy modification of the previous treatment. Semantically, such variations of the signature
give other algebras of functors in $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$ for trees with pointers of various forms.

6.1. Left-to-right pointers. First, we consider cyclic sharing trees involving left-to-right
pointers and not involving right-to-left pointers. An example is the tree (i) in Fig. 3,
represented by $\mathsf{bin}(\mathsf{bin}(\mathsf{lf}(5), \swarrow 21{\uparrow}2), \mathsf{bin}(\mathsf{lf}(8), \mathsf{lf}(7)))$. This case retains the uniqueness
property of the representation for a given rooted graph.

A signature $\Sigma$, types $\mathbb{T}$, contexts $\mathbb{T}^*$ and positions $\mathcal{P}os$ are defined exactly as
they are in Sect. 5.



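Typing rules

$$\frac{|\Gamma| = i - 1 \qquad p \in \mathcal{P}os(\sigma)}{\Gamma, \sigma, \Gamma' \vdash\ \swarrow p{\uparrow}i : P}
\qquad
\frac{\gamma_1, \Gamma \vdash t_1 : \tau_1 \ \cdots\ \gamma_n, \Gamma \vdash t_n : \tau_n \qquad f^{(n)} \in \Sigma}{\Gamma \vdash f(t_1, \ldots, t_n) : [f](\tau_1, \ldots, \tau_n)}$$

where $\gamma_n = [f](E, \ldots, E)$ and $\gamma_i = [f](E, \ldots, E, \tau_{i+1}, \ldots, \tau_n)$ for each $1 \le i \le n-1$.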



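The category $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$, a signature functor $\Sigma : (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}} \to (\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$, and the initial
$\Sigma$-algebra $T$ are also defined similarly to the definitions in Sect. 5. The associated structural
recursion and induction principles follow as well.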

Example 6.1. Consider the case of cyclic sharing binary trees involving left-to-right point- 
ers. The difference from the original typing rules is the case of a binary node. Now, the 
above rule is instantiated as 

$$\frac{B(E, \tau), \Gamma \vdash s : \sigma \qquad B(E, E), \Gamma \vdash t : \tau}{\Gamma \vdash \mathsf{bin}(s, t) : B(\sigma, \tau)}$$

(i) Left-to-right (ii) Both l-to-r & r-to-l (iii) Allowing redundancy (vi) Indirect references

Figure 3: Trees involving various pointers



This means that the shape tree type $B(E, \tau)$ on the left of the upper judgments expresses
that (a pointer in) $s$ can point to a node in $t$, whereas the shape tree type $B(E, E)$ on the right
expresses that (a pointer in) $t$ cannot point to any node in $s$, because the node information of
$s$ is masked by the void shape $E$. The general typing rule for left-to-right pointers is obtained
by generalising this observation.



6.2. Symmetric form of pointers. We can further allow both right-to-left and left-to-right
pointers. An example is the tree (ii) in Fig. 3, represented by
$\mathsf{bin}(\mathsf{bin}(\mathsf{lf}(5), \swarrow 21{\uparrow}2), \mathsf{bin}(\mathsf{lf}(8), \swarrow 1{\uparrow}2))$. The only difference is in the typing rules.



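Typing rules

$$\frac{|\Gamma| = i - 1 \qquad p \in \mathcal{P}os(\sigma)}{\Gamma, \sigma, \Gamma' \vdash\ \swarrow p{\uparrow}i : P}
\qquad
\frac{\gamma_1, \Gamma \vdash t_1 : \tau_1 \ \cdots\ \gamma_n, \Gamma \vdash t_n : \tau_n \qquad f^{(n)} \in \Sigma}{\Gamma \vdash f(t_1, \ldots, t_n) : [f](\tau_1, \ldots, \tau_n)}$$

where $\gamma_i = [f](\tau_1, \ldots, \tau_{i-1}, E, \tau_{i+1}, \ldots, \tau_n)$ for each $1 \le i \le n$ (i.e. only the $i$-th argument
is set as $E$).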

A shape tree $\gamma_i = [f](\tau_1, \ldots, \tau_{i-1}, E, \tau_{i+1}, \ldots, \tau_n)$ is used to prohibit only redundant
references (i.e. going up to an upper node and then going back down through the same path)
by means of the void shape $E$.

This case has no uniqueness property for a given graph because, for example, the graph
in Fig. 4 (i) can be represented in two ways, (ii) and (iii), using cyclic sharing terms.

6.3. No restriction of pointers. In addition to the symmetric form of pointers, redundant
references can be allowed. An example is the tree (iii) in Fig. 3, represented by
$\mathsf{bin}(\mathsf{bin}(\mathsf{lf}(5), \mathsf{lf}(6)), \mathsf{bin}(\swarrow 2{\uparrow}2, \mathsf{lf}(7)))$. A redundant reference means that the path given by
$\swarrow p{\uparrow}i$ is not the shortest path to the destination. In the case of the tree (iii) in Fig. 3, the pointer
goes up to the root and then goes back down through the same path to the right child.



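Typing rules

$$\frac{|\Gamma| = i - 1 \qquad p \in \mathcal{P}os(\sigma)}{\Gamma, \sigma, \Gamma' \vdash\ \swarrow p{\uparrow}i : P}
\qquad
\frac{\gamma, \Gamma \vdash t_1 : \tau_1 \ \cdots\ \gamma, \Gamma \vdash t_n : \tau_n \qquad f^{(n)} \in \Sigma}{\Gamma \vdash f(t_1, \ldots, t_n) : [f](\tau_1, \ldots, \tau_n)}$$

where $\gamma = [f](\tau_1, \ldots, \tau_n)$.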
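The four disciplines seen so far differ only in the shape tree that each argument of a node receives as its new context entry. As a sketch, the following Haskell functions compute the list $\gamma_1, \ldots, \gamma_n$ from the argument shapes $\tau_1, \ldots, \tau_n$; the type Shape and the four function names are assumptions made for this illustration.

```haskell
-- Value-level shape trees: E, P, or a shape symbol applied to argument shapes.
data Shape = E | P | F String [Shape]
  deriving (Eq, Show)

-- Each function returns [gamma_1, ..., gamma_n] for a node with shape symbol f
-- and argument shapes ts = [tau_1, ..., tau_n].

-- Sect. 5: gamma_i keeps the arguments to the left of i and masks the rest.
rightToLeft :: String -> [Shape] -> [Shape]
rightToLeft f ts = [ F f (take (i - 1) ts ++ replicate (n - i + 1) E) | i <- [1 .. n] ]
  where n = length ts

-- Sect. 6.1: gamma_i keeps the arguments to the right of i and masks the rest.
leftToRight :: String -> [Shape] -> [Shape]
leftToRight f ts = [ F f (replicate i E ++ drop i ts) | i <- [1 .. n] ]
  where n = length ts

-- Sect. 6.2: gamma_i masks only the i-th argument.
symmetric :: String -> [Shape] -> [Shape]
symmetric f ts = [ F f (take (i - 1) ts ++ [E] ++ drop i ts) | i <- [1 .. n] ]
  where n = length ts

-- Sect. 6.3: nothing is masked, so every argument sees the whole node.
unrestricted :: String -> [Shape] -> [Shape]
unrestricted f ts = replicate (length ts) (F f ts)
```

For a binary node with argument shapes $\sigma$ and $\tau$, rightToLeft gives $B(E,E)$ and $B(\sigma,E)$, while leftToRight gives $B(E,\tau)$ and $B(E,E)$, in agreement with the rule of Example 6.1.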

(i) A graph (ii) Using right-to-left pointer (iii) Using left-to-right pointer

Figure 4: Two representations of a graph



6.4. Allowing indirect references. Up to this point in the discussion, we have assumed
that a pointer cannot point to another pointer node. A tree like (vi) in Fig. 3 has been
prohibited because we aimed to obtain a unique representation for a graph. However, that
assumption can also be relaxed. This is achieved merely by modifying the definition of $\mathcal{P}os$
as

$$\mathcal{P}os(P) = \{\epsilon\}.$$

The tree (vi) in Fig. 3 is represented by $\mathsf{bin}(\mathsf{bin}(\mathsf{lf}(5), \swarrow 1{\uparrow}1), \mathsf{bin}(\swarrow 12{\uparrow}2, \mathsf{lf}(7)))$.

6.5. Pointers from inner nodes. We can allow pointers from inner nodes, i.e., not only
from leaves as we have considered so far. This is done by introducing a new term construct

$$f(\swarrow p{\uparrow}i.\ t_1, \ldots, t_n)$$

which expresses that an inner node $f$ having $n$ children also has a pointer slot. The typing rule
is the combination of the previous term formation rules for pointers and function terms.



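Typing rule

$$\frac{|\Gamma| = i - 1 \qquad p \in \mathcal{P}os(\sigma) \qquad \gamma_1, \Gamma \vdash t_1 : \tau_1 \ \cdots\ \gamma_n, \Gamma \vdash t_n : \tau_n \qquad f^{(n)} \in \Sigma}{\Gamma, \sigma, \Gamma' \vdash f(\swarrow p{\uparrow}i.\ t_1, \ldots, t_n) : [f](\tau_1, \ldots, \tau_n)}$$

where $\gamma_i = [f](\tau_1, \ldots, \tau_{i-1}, E, \tau_{i+1}, \ldots, \tau_n)$ for each $1 \le i \le n$.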



This form of terms is used as a data model of XML called trees with pointers [CGZ05]
by Calcagno, Gardner and Zarfaty.



6.6. Mixing variations. The form of pointers need not be uniform (i.e. all pointers having
the same form) as in the descriptions above. For example, in a single tree, it is possible that some
function symbols allow only right-to-left pointers, some others allow only left-to-right, and
some others allow both, etc. This possibility is realised merely by assigning an appropriate
type to each function symbol, which shows that our framework has the expressive power to
control the form of pointers.

It is important to note that in any variation of the typing rules, the safety property of
pointers stated in Theorem 5.2 still holds. Truly dangling pointers cannot occur in this
framework.

7. Connection to Equational Term Graphs in the Initial Algebra Framework

We have investigated a term syntax for cyclic sharing structures, which gives a
representation of a graph. In this section, we give the converse, i.e., an explicit way to calculate
the graph for a given cyclic sharing term. This amounts to giving a semantics of a cyclic sharing
term by a finite graph. We give it using Ariola and Klop's equational term graphs in the
initial algebra framework. This semantics also clarifies connections to existing works that
have explored the semantics of cyclic sharing structures.

Equational term graphs [AK96] are another representation of cyclic sharing trees,
which have been used in a formulation of term graph rewriting. This is a representation
of a rooted graph obtained by associating a unique name to each node and by
writing down the interconnections through a set of recursive equations. For example,
the graph portrayed in Figure 5 is represented as the equational term graph

$$\{x \mid x = \mathsf{bin}(y_1, y_2),\ y_2 = \mathsf{lf}(9),\ y_1 = \mathsf{bin}(z, z),\ z = \mathsf{bin}(x, u),\ u = \mathsf{lf}(6)\}.$$

Figure 5: rooted graph

We use this form of equational term graphs, which is called
flattened form in [AK96], and which is formally defined as follows
(NB. it differs slightly from the original syntax to make explicit
the connection to cyclic sharing terms).

Suppose a signature $\Sigma$ and a set $X = \{x, x_1, \ldots\}$ of variables.
An equational term graph is of the form

$$\{x \mid x_1 = t_1,\ x_2 = t_2,\ \ldots\}$$

where each $t_i$ follows the syntax

$$t ::= x \mid\ \swarrow p{\uparrow}i\ \mid f(x_1, \ldots, x_n).$$

A variable is called bound if it appears on the left-hand side of an equation; it is called
free otherwise. We also regard $\swarrow p{\uparrow}i$ as a free variable.
We assume that any useless equation $y = t$, where $y$ is not reachable from the root,
is automatically removed in the presentation of equational term graphs [AK96] (hence,
equational term graphs are always connected and single-rooted).
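The flattened form and the removal of useless equations can be sketched in Haskell as follows, specialised to the binary-tree signature. All names here (Rhs, EGraph, reachable, prune, fig5) are assumptions for illustration, and the extra equation for w is added only to have something to remove.

```haskell
-- Right-hand sides of flattened equations over the binary-tree signature.
data Rhs
  = Var String           -- a variable x
  | Ptr [Int] Int        -- a free pointer: position p, i levels up
  | Leaf Int             -- lf(k)
  | Bin String String    -- bin(x1, x2)
  deriving (Eq, Show)

-- An equational term graph {root | x1 = t1, x2 = t2, ...}.
data EGraph = EGraph { root :: String, eqs :: [(String, Rhs)] }
  deriving (Eq, Show)

-- Variables reachable from the root, in depth-first order of first visit.
reachable :: EGraph -> [String]
reachable (EGraph r es) = go [] [r]
  where
    go seen []       = reverse seen
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise     = go (x : seen) (succs x ++ xs)
    succs x = case lookup x es of
      Just (Var y)   -> [y]
      Just (Bin a b) -> [a, b]
      _              -> []

-- Drop every equation whose bound variable is not reachable from the root.
prune :: EGraph -> EGraph
prune g = g { eqs = [ e | e@(x, _) <- eqs g, x `elem` keep ] }
  where keep = reachable g

-- The graph of Figure 5, plus one useless equation for the variable w.
fig5 :: EGraph
fig5 = EGraph "x"
  [ ("x", Bin "y1" "y2"), ("y2", Leaf 9), ("y1", Bin "z" "z")
  , ("z", Bin "x" "u"), ("u", Leaf 6)
  , ("w", Leaf 0) ]
```

The depth-first order returned by reachable is also the order in which nodes receive their positions when the graph is drawn as a tree-like graph.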

For the case of an unrooted graph, it can be rooted by choosing an arbitrary starting node. It may also
have several other distinct connected components, which might be represented by a set of equational term
graphs.

We define a translation from a cyclic sharing term to an equational term graph by the
unique homomorphism from the initial algebra to an algebra consisting of equational term
graphs. The idea is to use positions as unique variables in an equational term graph. We
define $\mathrm{EGraph}_\tau(\Gamma)$ as the set of all equational term graphs having free variables taken from
$\mathrm{PO}(\Gamma)$ (the shape index $\tau$ is meaningless for equational term graphs; we include
it only to form a presheaf). This EGraph forms a presheaf in $(\mathbf{Set}^{\mathbb{T}^*})^{\mathbb{T}}$. Any equational
term graph can be drawn as a tree-like graph as in Fig. 5 by traversing each node in a depth-first
search manner from the root. Therefore, we can assign each node its position in the
whole equational term graph. Consequently, an equational term graph

$$\{x_1 \mid x_1 = t_1,\ x_2 = t_2,\ \ldots\}$$

can be normalised to an "$\alpha$-normal form" in which, for each $x = t$, the bound variable $x$ is
renamed to the position of $t$ in the whole term as

$$\{\epsilon \mid \epsilon = t'_1,\ 1 = t'_2,\ \ldots\}$$

(see Example 7.2). We identify an equational term graph with its $\alpha$-normal form.

Proposition 7.1. EGraph forms a $\Sigma$-algebra. The unique homomorphism $[\![-]\!] : T \rightarrowtail \mathrm{EGraph}$
is monomorphic, giving an interpretation of a cyclic sharing term as a
graph represented by an equational term graph.

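Proof. We define an algebra structure on EGraph of $\alpha$-normal forms as follows.

$$f^{\mathrm{EGraph}}(\Gamma)(\{\epsilon \mid G_1\}, \ldots, \{\epsilon \mid G_n\}) = \{\epsilon \mid \epsilon = f(1, \ldots, n),\ G'_1, \ldots, G'_n\}$$
$$\text{where } \{1 \mid G'_1\} = \mathrm{shift}_1(\{\epsilon \mid G_1\}) \ \cdots\ \{n \mid G'_n\} = \mathrm{shift}_n(\{\epsilon \mid G_n\})$$
$$\mathsf{ptr}^{\mathrm{EGraph}}(\Gamma)(\swarrow p{\uparrow}i) = \{\epsilon \mid \epsilon =\ \swarrow p{\uparrow}i\}$$
$$\mathrm{shift}_i(\{\epsilon \mid \epsilon = t_1,\ 1 = t_2,\ \ldots\}) = \{i \mid i = \mathrm{shift}_i(t_1),\ i.1 = \mathrm{shift}_i(t_2),\ \ldots\}$$
$$\mathrm{shift}_i(p) = i.p \quad \text{for a position } p$$
$$\mathrm{shift}_i(f(x_1, \ldots, x_n)) = f(i.x_1, \ldots, i.x_n)$$
$$\mathrm{shift}_i(\swarrow p{\uparrow}x) = \begin{cases} p & \text{if } x = 1 \\ \swarrow p{\uparrow}(x - 1) & \text{if } x > 1 \end{cases}$$

The function $\mathrm{shift}_i$ shifts every bound variable by a position $i \in \mathbb{N}$ (i.e. prepending $i$ as
a prefix) in a term to form an equational term graph suitably. Then, it is obvious that $[\![-]\!]$ is
monomorphic and that it gives a translation from cyclic sharing terms to equational term
graphs. $\Box$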

Notice that $[\![-]\!] : T \rightarrowtail \mathrm{EGraph}$ is not an isomorphism. Equational term graphs have
much more freedom to express graphs than cyclic sharing terms. For example, although
$\{x \mid x = x\}$ is a valid equational term graph (the "black hole"), no corresponding cyclic
sharing term exists.

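Example 7.2. Consider the term $\mu x.\mathsf{bin}(\mu y_1.\mathsf{bin}(\mu z.\mathsf{bin}({\uparrow}x, \mathsf{lf}(6)), \swarrow 1{\uparrow}y_1), \mathsf{lf}(9))$ of Fig. 1.
This is represented as the following term in de Bruijn notation and is interpreted as an equational
term graph:

$$\mathsf{bin}(\mathsf{bin}(\mathsf{bin}({\uparrow}3, \mathsf{lf}(6)), \swarrow 1{\uparrow}1), \mathsf{lf}(9))$$

$$\{\epsilon \mid \epsilon = \mathsf{bin}(1, 2),\ 12 = 11,\ 112 = \mathsf{lf}(6),\ 1 = \mathsf{bin}(11, 12),\ 111 = \epsilon,\ 2 = \mathsf{lf}(9),\ 11 = \mathsf{bin}(111, 112)\}.$$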
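The interpretation used in this example can be traced with a small program. The following Haskell sketch works on untyped de Bruijn terms of the binary-tree signature and produces equations in $\alpha$-normal form, with positions as bound variables. The data types and function names are assumptions for illustration, and the case in which a pointer with up-count 1 becomes the position it refers to is inferred from this example.

```haskell
type Pos = [Int]                        -- [] plays the role of epsilon

-- Untyped cyclic sharing terms in de Bruijn notation.
data Term = Ptr Pos Int | Lf Int | Bin Term Term

-- Right-hand sides of equations in alpha-normal form.
data Rhs
  = At Pos            -- a bound variable, i.e. a position
  | Free Pos Int      -- an unresolved pointer: position p, x levels up
  | Leaf Int          -- lf(k)
  | Node Pos Pos      -- bin(x1, x2)
  deriving (Eq, Show)

type EGraph = [(Pos, Rhs)]              -- the root is the equation for []

-- shift_i on right-hand sides: prefix bound variables, lower pointers by one.
shiftRhs :: Int -> Rhs -> Rhs
shiftRhs i (At p)     = At (i : p)
shiftRhs i (Node a b) = Node (i : a) (i : b)
shiftRhs _ (Leaf k)   = Leaf k
shiftRhs _ (Free p x)
  | x > 1     = Free p (x - 1)
  | otherwise = At p                    -- the pointer reaches the current node

-- shift_i on graphs: rename every bound variable x to i.x.
shift :: Int -> EGraph -> EGraph
shift i g = [ (i : x, shiftRhs i t) | (x, t) <- g ]

-- The homomorphism from terms to equational term graphs.
translate :: Term -> EGraph
translate (Ptr p x) = [([], Free p x)]
translate (Lf k)    = [([], Leaf k)]
translate (Bin s t) =
  ([], Node [1] [2]) : shift 1 (translate s) ++ shift 2 (translate t)

-- The de Bruijn term of Example 7.2.
example :: Term
example = Bin (Bin (Bin (Ptr [] 3) (Lf 6)) (Ptr [1] 1)) (Lf 9)
```

Applied to example, translate yields the seven equations listed above: the pointer ${\uparrow}3$ is lowered twice and then resolved to $\epsilon$ at the root, and $\swarrow 1{\uparrow}1$ is resolved to position 1 of its parent before being prefixed to 11.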

8. Further Connections to Other Works 

The semantics of cyclic sharing terms by equational term graphs opens connections to
other semantics as $T \to \mathrm{EGraph} \to S$, where $S$ is any of the following semantics of
equational term graphs.

(i) letrec-expressions: an equational term graph is obviously seen as a letrec-expression.
(ii) Domain-theoretic semantics: mentioned below.
(iii) Categorical semantics in terms of traced symmetric monoidal categories [Has97].
(iv) Coalgebraic semantics: a graph is regarded as a coalgebraic structure that produces
every node information along its edges, e.g. [AAMV03].

letrec-expressions are more expressive than equational term graphs because they can express multiple
roots by putting a tuple $(x_1, \ldots, x_n)$ of roots of distinct connected components in the body of a
letrec-expression [Has97].

The domain-theoretic semantics of letrec-expressions or systems of recursive equations (e.g.
[CKV74]) is now standard; it gives the infinite expansion of cyclic sharing structures. Via
equational term graphs, we can interpret our cyclic sharing terms in each of these semantics.
Each semantics has its own advantages and principles related to some aspects of cyclic sharing
structures. However, none of these has focused on our goals, which are the following: [I]
a simple term syntax that admits structural induction, and [II] direct usability in functional
programming, as described in the Introduction. Therefore, we have chosen the initial algebra
approach to cyclic sharing structures.

Although insufficient, the above semantics (i) and (ii) are close to our goals in the following
way. Consider the cyclic sharing term $\mu x.\mathsf{bin}(\mu y_1.\mathsf{bin}(\mu z.\mathsf{bin}({\uparrow}x, \mathsf{lf}(6)), \swarrow 1{\uparrow}y_1), \mathsf{lf}(9))$
of Fig. 1. As considered in Example 7.2, this is interpreted as the equational term graph:

$$\{\epsilon \mid \epsilon = \mathsf{bin}(1, 2),\ 12 = 11,\ 112 = \mathsf{lf}(6),\ 1 = \mathsf{bin}(11, 12),\ 111 = \epsilon,\ 2 = \mathsf{lf}(9),\ 11 = \mathsf{bin}(111, 112)\}.$$

Using domain-theoretic semantics, we can obtain its expansion as an infinite term

$$\mathsf{bin}(\mathsf{bin}(\mathsf{bin}(\cdots, \mathsf{lf}(6)), \mathsf{bin}(\cdots, \mathsf{lf}(6))), \mathsf{lf}(9)) \qquad (8.1)$$

where each "$\cdots$" is actually an infinitely long term that repeats the whole term. This is regarded
as an expansion of the structure in which each pointer node "$\swarrow p{\uparrow}i$" is connected directly
to the referred node.

Defining this idea in a lazy functional language based on domain-theoretic semantics,
such as Haskell, yields another interesting representation related to the use of internal pointer
structures. Let us consider this in Haskell. Let the type HTree be a lazy datatype of trees
defined by

$$t ::= \mathsf{lf}(k) \mid \mathsf{bin}(t_1, t_2)$$

(but here, for simplicity, we retain mathematical notation rather than Haskell). We then
define the translation function $\mathrm{trans} : \mathrm{EGraph} \to \mathrm{HTree}$ from equational term
graphs to HTree by

$$\mathrm{trans}(\{y_1 \mid y_1 = r_1, \ldots, y_n = r_n\}) = \mathbf{let}\ (\vec{x}) = (\vec{r})[\vec{y} \mapsto \vec{x}]\ \mathbf{in}\ x_1 \qquad (8.2)$$

where vectors denote sequences, and $[- \mapsto -]$ is a substitution function of variables (written
in Haskell). At the level of Haskell, this gives a translation into internal pointer structures
in the heap memory of an implementation, because a let-expression (which is theoretically
letrec) generates a pointer structure as presented in Fig. 5 because of the graph reduction
mechanism of Haskell. Printing it will generate an infinite term as in Eq. (8.1). In this
way, starting from T via equational term graphs, our cyclic sharing terms can be used as
"blueprints" of pointer structures in the memory.

A problem with the pointer structures is the lack of structural induction. Exactly how to
compose and decompose the pointer structures cleanly at the level of Haskell
programming remains unclear. Therefore, we consider this approach somewhat
insufficient for our goals, although it is nevertheless efficient and interesting.
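The translation (8.2) can be sketched in Haskell by tying the knot through a lazily built environment instead of a literal let-expression. This is one possible realisation under stated assumptions, not necessarily the author's: the types HTree and Rhs, the function cut (which prints a tree only down to a given depth) and the value fig5 are hypothetical names, and free pointers are omitted because only closed graphs are translated.

```haskell
-- Lazy trees: t ::= lf(k) | bin(t1, t2).
data HTree = HLf Int | HBin HTree HTree

-- Right-hand sides of a closed, flattened equational term graph.
data Rhs = Var String | Leaf Int | Node String String

-- trans root eqns: each bound variable denotes one shared, lazily built tree,
-- so cycles in the graph become cycles in the heap.
trans :: String -> [(String, Rhs)] -> HTree
trans r eqns = look r
  where
    env = [ (y, build t) | (y, t) <- eqns ]
    look y = case lookup y env of
      Just t  -> t
      Nothing -> error ("unbound variable " ++ y)
    build (Var y)    = look y
    build (Leaf k)   = HLf k
    build (Node a b) = HBin (look a) (look b)

-- Print a tree down to depth d, writing "..." where an inner node is cut off.
cut :: Int -> HTree -> String
cut _ (HLf k)    = "lf(" ++ show k ++ ")"
cut 0 _          = "..."
cut d (HBin l r) = "bin(" ++ cut (d - 1) l ++ "," ++ cut (d - 1) r ++ ")"

-- The equational term graph of Figure 5.
fig5 :: [(String, Rhs)]
fig5 =
  [ ("x", Node "y1" "y2"), ("y2", Leaf 9), ("y1", Node "z" "z")
  , ("z", Node "x" "u"), ("u", Leaf 6) ]
```

Evaluating cut 3 (trans "x" fig5) reproduces the finite prefix displayed in Eq. (8.1). A graph such as $\{x \mid x = x\}$ makes trans diverge, which matches its reading as the "black hole".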

9. Conclusion

We have given an initial algebra characterisation of cyclic sharing structures and derived
inductive datatypes, structural recursion, and structural induction on them. We have also
associated them with equational term graphs in the initial algebra framework. Hence we
have shown that various ordinary semantics of cyclic sharing structures apply equally
to them.

From a programming perspective, the practicality of our datatype of cyclic sharing
structures must still be investigated. A possible direction of future work is to make serious use of a
dependently-typed programming language such as Coq or Agda for programming with
cyclic sharing structures as an extension of this work.